Lazy-allocate error latency histogram on AggregateEntry by dougqh · Pull Request #11478 · DataDog/dd-trace-java

dougqh · 2026-05-27T18:34:39Z

Summary

Defer errorLatencies histogram allocation until the first error is recorded on an entry. Most entries never see an error in their lifetime; previously each one carried a ~60-80 byte empty DDSketchHistogram for life.
Across a full 2048-entry table, saves ~150 KB if 95% of entries never error (the typical case).
SerializingMetricWriter caches the serialized form of an empty histogram (~17 bytes) and emits those cached bytes when an entry's errorLatencies is null, so the wire format is byte-identical to before.

Background

Extracted from #11389, where the same change was bundled with cardinality- and peer-tag-related work. This PR is just the lazy-errorLatencies piece; it sits between #11382 and #11387 so it can ship without depending on the cardinality machinery in #11387.

Trade-off

Entries that do see an error retain the histogram across clear() (cleared, not nulled). An always-erroring entry allocates exactly once. Same total allocation as before for that path.

Throughput benchmarks

This is a heap-footprint change, not a CPU one — the consumer's hot path is unchanged. The bench suite was re-run anyway as a sanity check to confirm no throughput regression vs the #11382 base. Same machine state and JMH config as the rest of the stack's runs (8 producer threads, 2×15s warmup + 5×15s, 1 fork, throughput mode).

Bench (ops/s)	v1.62.0	master	#11382	this PR (#11478)
`Adversarial`	444,290 ± 1,616,937	14,276,351 ± 1,091,138	32,556,300 ± 4,321,490	30,609,314 ± 6,944,664
`HighCardinalityResource`	4,854,335 ± 1,214,233	8,168,005 ± 3,493,716	35,739,452 ± 2,556,684	34,552,088 ± 4,687,212
`HighCardinalityPeer`	6,902,209 ± 368,641	10,110,142 ± 3,380,594	37,638,634 ± 6,673,337	35,491,425 ± 4,970,576

#11478 vs #11382 is within the per-run error bar on every bench (0.94×–0.97×) — statistically indistinguishable. The CPU-side hot path didn't change: recordOneDuration now calls errorLatenciesForWrite() instead of reading a final field, but that's a single-field-load-and-branch on every entry's first error and a direct field load thereafter, which the JIT inlines flat. aggregateDropped counts are also in line with #11382, confirming the lazy field doesn't perturb the table-cap behavior.

The actual win — the ~150 KB heap reclamation at full table cap when 95% of entries never error — isn't observable in a throughput bench. It would show up in jol-based per-entry footprint inspection (one fewer histogram per entry) or in a long-running profile of allocated-bytes-per-cycle (errorLatencies allocation amortizes from "one per unique key" to "one per unique error-emitting key").

Test plan

:dd-trace-core:test — metrics tests pass
No behavior change to the client-stats wire payload

🤖 Generated with Claude Code

datadog-datadog-prod-us1-2 · 2026-05-27T18:48:24Z

✨ Fix all issues with BitsAI

⚠️ Warnings

🚦 2 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-java | test_smoke_graalvm: [graalvm21]

🔄 Retry job. This looks flaky and may succeed on retry.
Could not read workspace metadata from /go/src/github.com/DataDog/apm-reliability/dd-trace-java/.gradle/caches/8.14.5/groovy-dsl/002e23b2fcf63bd84058c2ef445bc284/metadata.bin due to EOFException.

Check pull requests | Check pull requests

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration.
Labels check failed: Please add at least one type and one component or instrumentation label to the pull request.

Useful? React with 👍 / 👎

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: d99e902 | Docs | Datadog PR Page | Give us feedback!}

dd-octo-sts · 2026-05-27T18:53:10Z

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite	Status
Startup	🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results

Scenario	Candidate	master	Δ (95% CI of mean)
startup:insecure-bank:iast:Agent	14.01 s	14.00 s	[-1.0%; +1.2%] (no difference)
startup:insecure-bank:tracing:Agent	12.99 s	12.93 s	[-0.3%; +1.4%] (no difference)
startup:petclinic:appsec:Agent	15.75 s	16.44 s	[-12.4%; +4.0%] (unstable)
startup:petclinic:iast:Agent	16.57 s	16.66 s	[-1.8%; +0.7%] (no difference)
startup:petclinic:profiling:Agent	15.53 s	16.49 s	[-14.2%; +2.6%] (unstable)
startup:petclinic:tracing:Agent	15.91 s	15.84 s	[-1.1%; +2.0%] (no difference)

Commit: d99e9029 · CI Pipeline · Benchmarking Platform UI

Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

amarziali · 2026-05-29T07:05:06Z

+    }
+  }
+
+  private byte[] emptyHistogramBytesCache;


this can be final and initialised early? I don't think it will make a big difference. Also the field should be placed up among the other fields and not among methods for clarity

Addressed — field moved up to the instance fields block. Kept lazy rather than final/eager: Histogram.newHistogram() requires the Histograms factory to be registered, and SerializingMetricWriter is constructed during tracer startup before that registration completes, so eager init would throw. Added an inline comment on the field explaining this. The single-writer invariant (aggregator thread only) means no synchronization is needed either.

(Reply by Claude Sonnet 4.6)

Another way that we could address the cycle is by placing this field inside a helper class. That would allow us to make the field final and let the reference be constant propagated.

Each AggregateEntry allocated two DDSketchHistograms in its constructor (ok + error latencies). DDSketchHistogram wraps a DDSketch + lazy store, roughly 60-80 bytes per histogram even when empty. Most spans aren't errors, so most entries' errorLatencies sit empty for life. Now the field starts null. recordOneDuration lazy-allocates on the first error; if no error ever lands on the entry, it stays null and ~80 bytes of empty-histogram overhead are reclaimed. Across a full 2048-entry table that's ~150 KB if 95% of entries never error -- the typical case. For the wire format, SerializingMetricWriter caches the serialized form of an empty histogram (~17 bytes) on first use and writes those cached bytes when an entry's errorLatencies is null. The cache is per-writer (not a global static) so each writer instance picks up the Histograms factory state at the time of its first report, avoiding races with test setup that registers the DDSketch factory at varying points. Trade-off: entries that DO see an error retain the histogram across clear() (just cleared, not nulled), so always-erroring entries allocate exactly once. Same total allocation as before for that case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dd-octo-sts · 2026-06-01T15:30:47Z

Hi! 👋 Thanks for your pull request! 🎉

To help us review it, please make sure to:

Add at least one type, and one component or instrumentation label to the pull request

If you need help, please check our contributing guidelines.

…ports - Move emptyHistogramBytesCache up to the instance fields block - Import java.nio.ByteBuffer and datadog.metrics.api.Histogram; drop FQNs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dougqh · 2026-06-01T17:14:15Z

Addressed @amarziali's two comments:

Field placement / eager init: emptyHistogramBytesCache moved up to the instance fields block. Kept lazy (not final/eager) because Histogram.newHistogram() requires the Histograms factory to be registered, and SerializingMetricWriter is constructed during tracer startup before that registration completes — eager init would throw. Added an inline comment explaining this reasoning. The single-writer invariant (aggregator thread only) means no synchronization is needed on the field either.
FQNs: Imported java.nio.ByteBuffer and datadog.metrics.api.Histogram; replaced FQNs in the method body.

(Comment written by Claude Sonnet 4.6)

dougqh · 2026-06-01T20:29:39Z

I want to collect benchmarking data before merging this. I'm still working on that.

This was referenced May 27, 2026

Per-component / tag cardinality limits in client-side stats #11387

Draft

Memory-efficiency pass on ClientStatsAggregator + adversarial benchmark #11389

Draft

Add span-derived primary tags (CSS v1.3.0) #11402

Draft

dougqh marked this pull request as ready for review May 27, 2026 19:25

dougqh requested a review from a team as a code owner May 27, 2026 19:25

dougqh requested a review from amarziali May 27, 2026 19:25

dd-octo-sts Bot added the tag: ai generated Largely based on code generated by an AI or LLM label May 27, 2026

dougqh mentioned this pull request May 28, 2026

Use MpscRingBuffer for CSS producer/consumer inbox #11497

Closed

4 tasks

amarziali approved these changes May 29, 2026

View reviewed changes

Base automatically changed from dougqh/optimize-metric-key to master May 29, 2026 15:46

dougqh force-pushed the dougqh/lazy-error-latencies branch from f2ee559 to 0c658dd Compare June 1, 2026 15:30

dougqh and others added 3 commits June 1, 2026 13:08

Address amarziali review: move emptyHistogramBytesCache field, use im…

8c92dc0

…ports - Move emptyHistogramBytesCache up to the instance fields block - Import java.nio.ByteBuffer and datadog.metrics.api.Histogram; drop FQNs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add [Claude] comment explaining lazy init on emptyHistogramBytesCache

410790c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Drop [Claude] tag from emptyHistogramBytesCache comment

2170c16

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge branch 'master' into dougqh/lazy-error-latencies

d99e902

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lazy-allocate error latency histogram on AggregateEntry#11478

Lazy-allocate error latency histogram on AggregateEntry#11478
dougqh wants to merge 5 commits into
masterfrom
dougqh/lazy-error-latencies

dougqh commented May 27, 2026 •

edited

Loading

Uh oh!

datadog-datadog-prod-us1-2 Bot commented May 27, 2026 •

edited by datadog-official Bot

Loading

Uh oh!

dd-octo-sts Bot commented May 27, 2026 •

edited

Loading

Uh oh!

amarziali May 29, 2026

Uh oh!

dougqh Jun 1, 2026

Uh oh!

dougqh Jun 1, 2026 •

edited

Loading

Uh oh!

Uh oh!

dd-octo-sts Bot commented Jun 1, 2026 •

edited

Loading

Uh oh!

dougqh commented Jun 1, 2026

Uh oh!

dougqh commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dougqh commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Background

Trade-off

Throughput benchmarks

Test plan

Uh oh!

datadog-datadog-prod-us1-2 Bot commented May 27, 2026 • edited by datadog-official Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Warnings

Uh oh!

dd-octo-sts Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🟢 Java Benchmark SLOs — All performance SLOs passed

Uh oh!

amarziali May 29, 2026

Choose a reason for hiding this comment

Uh oh!

dougqh Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

dougqh Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dd-octo-sts Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dougqh commented Jun 1, 2026

Uh oh!

dougqh commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dougqh commented May 27, 2026 •

edited

Loading

datadog-datadog-prod-us1-2 Bot commented May 27, 2026 •

edited by datadog-official Bot

Loading

dd-octo-sts Bot commented May 27, 2026 •

edited

Loading

dougqh Jun 1, 2026 •

edited

Loading

dd-octo-sts Bot commented Jun 1, 2026 •

edited

Loading